We present a novel technique to parametrize experimental data, based on the construction of a probability measure in the space of functions, which retains the full experimental information on errors and correlations. This measure is constructed in a two-step process: first, a Monte Carlo sample of replicas of the experimental data is generated, and then an ensemble of neural networks is trained over them. This parametrization does not introduce any bias due to the choice of a fixed functional form. Two applications of this technique are presented. First, a probability measure in the space of the spectral function $\rho_{V-A}(s)$ is generated, which incorporates theoretical constraints such as chiral sum rules, and is used to evaluate the vacuum condensates. Then we construct a probability measure in the space of the proton structure function $F_2(x, Q^2)$, which updates previous work by incorporating HERA data.



1. Introduction and motivation 

The general problem we are considering here¹ is that of the parametrization of experimental data. This is an ill-posed problem, since it consists of obtaining continuous functions from a finite set of measurements. Standard parametrizations, which consist of choosing a functional form and fitting its parameters to the data, have a series of shortcomings. First, the choice of a functional form introduces an a priori bias, which implies a theoretical uncertainty whose size is very difficult to assess. Another important problem is how errors and correlations are represented within the parametrization, and how uncertainties are propagated to other observables, since linear error propagation is not reliable in general.
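A toy illustration of why linear error propagation can fail: for an observable that depends nonlinearly on the measured quantity, first-order propagation can misestimate both the central value and the spread, whereas Monte Carlo sampling of the input distribution does not rely on linearity. The Python sketch below uses a hypothetical measurement $x = 0 \pm 1$ and the observable $O = x^2$, for which the exact mean and standard deviation are $1$ and $\sqrt{2}$; the function names are illustrative only.

```python
import random
import statistics


def propagate_linear(mu, sigma, func, dfunc):
    """First-order (linear) propagation: O ~ func(mu) +- |func'(mu)| * sigma."""
    return func(mu), abs(dfunc(mu)) * sigma


def propagate_monte_carlo(mu, sigma, func, n_samples=200_000, seed=1):
    """Sample the Gaussian input and push every sample through func."""
    rng = random.Random(seed)
    values = [func(rng.gauss(mu, sigma)) for _ in range(n_samples)]
    return statistics.fmean(values), statistics.stdev(values)


# Hypothetical toy case: a measurement x = 0 +- 1 and the observable O = x**2.
lin_mean, lin_std = propagate_linear(0.0, 1.0, lambda x: x * x, lambda x: 2.0 * x)
mc_mean, mc_std = propagate_monte_carlo(0.0, 1.0, lambda x: x * x)
print(lin_mean, lin_std)  # 0.0 0.0
print(mc_mean, mc_std)    # close to the exact values 1 and sqrt(2)
```

Linear propagation evaluates the derivative at the central value, where it vanishes, and therefore reports $O = 0 \pm 0$. The replica method of Section 2 applies the same Monte Carlo logic to full data sets.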

The motivation for this problem is the issue of Parton Distribution Functions (PDFs) for the LHC [1]. Recently, considerable theoretical and experimental effort has been invested in their accurate determination, and in particular in their associated errors, in view of the accurate computation of collider processes and the determination of QCD parameters. On the theory side, the PDFs should be unbiased with respect to the choice of functional form; moreover, the full experimental information, including systematic errors and correlations, should be incorporated into the PDF parametrization in a way that allows it to be propagated to observables (such as cross sections) without introducing additional bias, for instance from linear approximations. The technique that we present here is specifically designed to fulfill all these requirements.

2. General strategy 

In this section we review the approach we take to this problem [2]. The basic idea is to construct a probability measure in the space of functions, $\mathcal{P}[f]$, from the experimental information on $f$. From this probability measure one can compute any observable, together with its associated uncertainty and correlations, using weighted averages:

$$\langle \mathcal{F}[f(x)] \rangle = \int \mathcal{D}f \, \mathcal{F}[f(x)] \, \mathcal{P}[f] \qquad (1)$$



¹ Talk given at the High-Energy Physics International Conference on Quantum Chromodynamics, Montpellier (France), 5-9 July 2004. Based on the work of Refs. [4,9].
The way this idea is implemented in our formalism is by using neural networks as universal unbiased interpolants, in a two-step process. The first step is Monte Carlo sampling, where we generate a number of artificial replicas of the experimental data. Then we train an ensemble of neural networks over these generated replicas. The whole procedure is then validated through suitable statistical estimators.

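The first step is the Monte Carlo sampling of experimental data, which consists of the generation of $N_{\mathrm{rep}}$ Monte Carlo sets of 'pseudo-data', replicas of the original $N_{\mathrm{dat}}$ data points, $f_i^{(\mathrm{art})(k)}$, $i = 1, \ldots, N_{\mathrm{dat}}$, $k = 1, \ldots, N_{\mathrm{rep}}$, using equations of the form

$$f_i^{(\mathrm{art})(k)} = A_N^{(k)} \left( f_i^{(\mathrm{exp})} + r_i^{(k)} \sigma_i^{\mathrm{stat}} + \sum_{l=1}^{N_{\mathrm{sys}}} r_l^{(k)} \sigma_i^{\mathrm{sys},l} \right) \qquad (2)$$

where $f_i^{(\mathrm{exp})}$ is the measured value, $\sigma_i^{\mathrm{stat}}$ and $\sigma_i^{\mathrm{sys},l}$ are the statistical and the different systematic errors of the point $i$, and $A_N^{(k)} = 1 + r_N^{(k)} \sigma_N$ is the contribution from the normalization error $\sigma_N$. The $r$ are univariate Gaussian random numbers. Correlated systematics share the same random numbers, and this takes into account the correlations in the generation of the replicas.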
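The replica generation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the fits: every systematic source is treated as fully correlated across points, the normalization enters as a single multiplicative factor per replica, all fluctuations are Gaussian, and the function and argument names are hypothetical.

```python
import random


def generate_replicas(central, stat, sys, norm_err, n_rep, seed=0):
    """Generate n_rep Monte Carlo replicas of one data set.

    central  -- measured values f_i^(exp)
    stat     -- statistical errors sigma_i^stat
    sys      -- sys[l][i] is the l-th systematic error of point i; every
                source l is assumed fully correlated across all points
    norm_err -- relative normalization uncertainty sigma_N
    """
    rng = random.Random(seed)
    replicas = []
    for _ in range(n_rep):
        # One Gaussian number per systematic source, shared by all points:
        # this is what builds the correlations into the replicas.
        r_sys = [rng.gauss(0.0, 1.0) for _ in sys]
        # One Gaussian number for the overall normalization of this replica.
        a_norm = 1.0 + rng.gauss(0.0, 1.0) * norm_err
        replica = []
        for i, value in enumerate(central):
            # Independent statistical fluctuation for each point.
            shifted = value + rng.gauss(0.0, 1.0) * stat[i]
            shifted += sum(r * sys_l[i] for r, sys_l in zip(r_sys, sys))
            replica.append(a_norm * shifted)
        replicas.append(replica)
    return replicas


# Hypothetical example: three points sharing a single correlated systematic.
reps = generate_replicas(
    central=[1.0, 2.0, 3.0],
    stat=[0.05, 0.05, 0.05],
    sys=[[0.10, 0.10, 0.10]],
    norm_err=0.02,
    n_rep=5000,
)
```

Averaged over many replicas, each point reproduces its measured value, while the shared systematic random number makes the fluctuations of different points strongly correlated; this is how the replicas carry the experimental correlations.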

The second part of our technique consists of training one neural network [3] on each Monte Carlo replica of the experimental data². Neural networks are useful for our purposes since they provide the most unbiased prior: they are robust, unbiased universal approximants, so using them eliminates the need to introduce a bias by choosing a functional form for our parametrization. We use a combination of two different techniques for training the networks: backpropagation (BP) learning and genetic algorithm (GA) learning. The second technique, inspired by evolutionary models in biology, has been widely used in other branches of science. In this context, training means the minimization, for each replica, of an error function evaluated with the covariance matrix:
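$$E^{(k)} = \frac{1}{N_{\mathrm{dat}}} \sum_{i,j=1}^{N_{\mathrm{dat}}} \Delta_i^{(k)} \left(\mathrm{cov}^{-1}\right)_{ij} \Delta_j^{(k)} \qquad (3)$$

where $\Delta_i^{(k)} = f_i^{(\mathrm{art})(k)} - f_i^{(\mathrm{net})(k)}$, with $f_i^{(\mathrm{net})(k)}$ the prediction of the $k$-th network for the point $i$, so that this error function measures the goodness of the fit.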
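The covariance-matrix error function, and its minimization by a genetic algorithm, can be sketched with NumPy as follows. The network architecture (one input, three sigmoid hidden units, one linear output), the population size, the Gaussian mutation and the truncation selection are all assumptions made for illustration; this is a deliberately minimal mutation-plus-selection scheme, not the combined BP and GA training described in Refs. [2,4].

```python
import numpy as np


def error_function(f_art, f_net, cov):
    """E = (1/N_dat) * Delta^T cov^{-1} Delta, with Delta = f_art - f_net."""
    delta = f_art - f_net
    # Solve cov @ y = delta rather than forming the explicit inverse.
    return float(delta @ np.linalg.solve(cov, delta)) / len(delta)


def net(params, x, n_hidden=3):
    """Toy 1-n_hidden-1 feed-forward network with sigmoid hidden units."""
    w1 = params[:n_hidden]
    b1 = params[n_hidden:2 * n_hidden]
    w2 = params[2 * n_hidden:3 * n_hidden]
    b2 = params[3 * n_hidden]
    hidden = 1.0 / (1.0 + np.exp(-(np.outer(x, w1) + b1)))
    return hidden @ w2 + b2


def train_ga(x, f_art, cov, n_hidden=3, pop_size=40, n_gen=200,
             mutation=0.1, seed=0):
    """Minimal genetic algorithm: mutate every member, keep the fittest."""
    rng = np.random.default_rng(seed)
    n_par = 3 * n_hidden + 1
    pop = rng.normal(0.0, 1.0, size=(pop_size, n_par))

    def fitness(p):
        return error_function(f_art, net(p, x, n_hidden), cov)

    for _ in range(n_gen):
        # Each member produces one child by Gaussian mutation of all weights.
        children = pop + rng.normal(0.0, mutation, size=pop.shape)
        candidates = np.vstack([pop, children])
        scores = np.array([fitness(p) for p in candidates])
        # Selection with elitism: parents compete with their children, so the
        # best error can never increase from one generation to the next.
        pop = candidates[np.argsort(scores)[:pop_size]]
    return pop[0], fitness(pop[0])


# Hypothetical replica: ten points of x**2 with partly correlated errors.
x = np.linspace(0.1, 0.9, 10)
f_art = x ** 2
cov = 1e-4 * np.eye(10) + 5e-5 * np.ones((10, 10))
params, err = train_ga(x, f_art, cov)
```

A GA only needs values of the error function, never its gradient, which makes it convenient when the error function is less standard than a sum over data points.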
The set of trained nets is the sought-for probability measure in the space of functions $f$, and defines a parametrization of the experimental information for $f$. Averages over this probability measure are performed using

$$\langle \mathcal{F}[f(x)] \rangle = \frac{1}{N_{\mathrm{rep}}} \sum_{k=1}^{N_{\mathrm{rep}}} \mathcal{F}\left[f^{(\mathrm{net})(k)}(x)\right] \qquad (4)$$

² Refs. [2,4] give a detailed description of the neural networks and the learning algorithms used in this technique.
The final step consists of the validation of the fitting process using statistical estimators.
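With one trained network per replica, averages over the probability measure become plain Monte Carlo averages over the ensemble: the central value, the uncertainty and the point-to-point correlation of $f$ follow from the sample mean, standard deviation and correlation of the network outputs. A minimal Python sketch, in which each trained network is represented by a plain callable (hypothetical stand-ins, not fitted results):

```python
import statistics


def ensemble_mean_std(nets, x):
    """Central value and uncertainty of f(x): mean and standard deviation
    of the predictions of all trained nets (one per replica)."""
    values = [f(x) for f in nets]
    return statistics.fmean(values), statistics.stdev(values)


def ensemble_correlation(nets, x1, x2):
    """Correlation between f(x1) and f(x2) over the ensemble of nets."""
    a = [f(x1) for f in nets]
    b = [f(x2) for f in nets]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(nets) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))


# Hypothetical stand-ins for three trained nets: f^(k)(x) = c_k * x.
nets = [lambda x, c=c: c * x for c in (0.9, 1.0, 1.1)]
mean, std = ensemble_mean_std(nets, 2.0)     # about 2.0 and 0.2
corr = ensemble_correlation(nets, 1.0, 2.0)  # about 1.0: fully correlated
```

Any other observable, for instance an integral of $f$, is handled the same way: evaluate it for each network and take the ensemble statistics.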

3. Spectral functions 

Now we turn to the first of the two applications [4] of this technique. The vector and axial-vector spectral functions $v_1(s)$, $a_1(s)$ have been measured in hadronic tau decays at LEP (ALEPH [5] and OPAL) up to $s = M_\tau^2$ with high precision, except near threshold. In particular, we are interested in parametrizing the vector minus axial-vector spectral function $\rho_{V-A}(s) = v_1(s) - a_1(s)$. This spectral function is interesting since it vanishes to all orders in perturbation theory (in the chiral limit), and is thus especially suited to study nonperturbative aspects of QCD. In particular, this spectral function is an order parameter of spontaneous chiral symmetry breaking.

The combined use of the operator product expansion and dispersion relations [6] allows an extraction of the QCD vacuum condensates in terms of convolutions of this spectral function,

$$\langle O_{2n+2} \rangle = (-1)^n \int_0^{s_0} ds \, s^n \rho_{V-A}(s) \qquad (5)$$

where $s_0$ is the upper limit of integration.
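Given an ensemble of replicas of $\rho_{V-A}(s)$, each condensate is computed replica by replica, and its uncertainty is the spread over the ensemble. The sketch below assumes the convention $\langle O_{2n+2} \rangle = (-1)^n \int_0^{s_0} ds\, s^n \rho_{V-A}(s)$ (sign and normalization are an assumption of this illustration), integrates with the trapezoidal rule, and uses constant toy spectral functions in place of trained networks; the function names are hypothetical.

```python
import statistics


def trapezoid(func, a, b, n=2000):
    """Composite trapezoidal rule for int_a^b func(s) ds with n intervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    total += sum(func(a + j * h) for j in range(1, n))
    return total * h


def condensate(rho, n, s0):
    """<O_{2n+2}> = (-1)^n int_0^{s0} ds s^n rho(s)  (assumed convention)."""
    return (-1) ** n * trapezoid(lambda s: s ** n * rho(s), 0.0, s0)


def condensate_with_error(rho_replicas, n, s0):
    """Mean and standard deviation of the condensate over the replicas."""
    values = [condensate(rho, n, s0) for rho in rho_replicas]
    return statistics.fmean(values), statistics.stdev(values)


# Hypothetical toy replicas (constant spectral functions), for illustration only.
toy = [lambda s, c=c: c for c in (0.9, 1.0, 1.1)]
o6_mean, o6_err = condensate_with_error(toy, n=2, s0=1.0)  # about 1/3 and 1/30
```

Because the uncertainty is taken from the spread of the replicas, no linearization of the integral with respect to the data is needed.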



However, the experimental data pose two problems for a clean extraction. First, information is available only up to the kinematic threshold $s = M_\tau^2$; second, there are large errors and correlations near this threshold due to phase-space suppression factors. One possible solution is to construct a probability measure in the space of spectral functions $\rho_{V-A}(s)$ using the general technique introduced above, and then use the chiral sum rules to constrain the large-$s$ behavior within this probability measure.

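Now we proceed as explained in Section 2. The only difference is that, in the GA training epoch, each replica is trained by minimizing a modified error function,

$$E^{(k)} = \frac{1}{N_{\mathrm{dat}}} \sum_{i,j=1}^{N_{\mathrm{dat}}} \Delta_i^{(k)} \left(\mathrm{cov}^{-1}\right)_{ij} \Delta_j^{(k)} + \lambda_{\mathrm{sr}} \sum_{l} \left( \int_0^{s_0} ds \, w_l(s) \, \rho_{V-A}^{(\mathrm{net})(k)}(s) - c_l \right)^2 + \lambda_{\mathrm{as}} \sum_{m} \left( \rho_{V-A}^{(\mathrm{net})(k)}(s_m) \right)^2 \qquad (6)$$

which takes into account both the theoretical constraints from the chiral sum rules (second term, where each sum rule is written as $\int ds\, w_l(s)\, \rho_{V-A}(s) = c_l$) and the asymptotic constraint $\rho_{V-A}(s \to \infty) = 0$ (third term, evaluated at a set of large values $s_m$); $\lambda_{\mathrm{sr}}$ and $\lambda_{\mathrm{as}}$ weight the constraints relative to the data. The chiral sum rules [7] incorporated into the probability measure are the Das-Mathur-Okubo sum rule, the first and second Weinberg sum rules (WSR), and the sum rule for the electromagnetic mass splitting of the pion³. The use of GA is crucial here, since it allows the learning of non-local error functions like Eq. (6), which depend on the network output over the whole range of $s$ and not only at the data points.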
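The extra terms of the modified error function can be sketched for a single network output $\rho^{(\mathrm{net})}(s)$: each sum rule, written as $\int ds\, w(s)\, \rho(s) = c$, contributes a squared penalty on the corresponding integral, and the asymptotic condition contributes a penalty on $\rho^2$ at a few large values of $s$. The penalty form, the weights `lam_sr` and `lam_as`, the integration cutoff `s_max` and the choice of large-$s$ points are assumptions of this sketch. Note that the penalties depend on the network output over the whole $s$ range, not only at the data points.

```python
def trapezoid(func, a, b, n=2000):
    """Composite trapezoidal rule for int_a^b func(s) ds with n intervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    total += sum(func(a + j * h) for j in range(1, n))
    return total * h


def constrained_error(data_error, rho_net, sum_rules, s_max, s_asym,
                      lam_sr=1.0, lam_as=1.0):
    """Data term plus penalties for sum-rule and asymptotic violations.

    sum_rules -- list of (w, c) pairs encoding int_0^s_max ds w(s) rho(s) = c
    s_asym    -- large-s points where rho_net is pushed towards zero
    """
    sr_penalty = sum(
        (trapezoid(lambda s: w(s) * rho_net(s), 0.0, s_max) - c) ** 2
        for w, c in sum_rules
    )
    asym_penalty = sum(rho_net(s) ** 2 for s in s_asym)
    return data_error + lam_sr * sr_penalty + lam_as * asym_penalty


# Second Weinberg sum rule, int ds s rho(s) = 0, applied to a toy network output.
second_wsr = (lambda s: s, 0.0)
e = constrained_error(0.5, lambda s: 1.0 - s, [second_wsr], s_max=2.0, s_asym=[1.5, 2.0])
```

Only the second Weinberg sum rule, whose right-hand side vanishes in the chiral limit, is used in the example; the other sum rules would enter as further `(w, c)` pairs.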

Using this technique, the probability measure for $\rho_{V-A}(s)$ is constructed, and from this measure we can evaluate any observable with the corresponding uncertainty. The results for the lower-dimensional condensates are $\langle O_6 \rangle = (-4.0 \pm 2.0) \times 10^{-3}\,\mathrm{GeV}^6$ and $\langle O_8 \rangle = (-12^{+\ldots}_{-\ldots}) \times 10^{-3}\,\mathrm{GeV}^8$ (see Fig. 1). While the result for $\langle O_6 \rangle$ is standard, the sign of $\langle O_8 \rangle$ varies across the literature⁴ [8]. We find that the uncertainties in the determination of the condensates are often underestimated, and that they are dominated by theoretical (model-dependent) uncertainties.

4. Structure functions 

Now we present the second application [9], a parametrization of the proton structure function $F_2(x, Q^2)$, which is an update of Ref. [2] with the following novel features: the incorporation of 11 more experiments (E665, H1 and ZEUS, in addition to NMC and BCDMS), direct minimization of the covariance-matrix error function, and an improved analysis of experimental uncertainties (such as a dedicated treatment of asymmetric errors and uncorrelated systematics). Note that, since we do not need to supply any functional form, a single fit covers the whole kinematic range, $6 \times 10^{-7} < x < 0.8$, $4.5 \times 10^{-2} < Q^2 < 3 \times 10^4\ \mathrm{GeV}^2$, which comprises regions with very different behaviour.

³ The most relevant turn out to be the two WSRs, $\int_0^\infty ds\, \rho_{V-A}(s) = 4\pi^2 f_\pi^2/2$ and $\int_0^\infty ds\, s\, \rho_{V-A}(s) = 0$.
⁴ See Ref. [4] for the details of this parametrization, a careful discussion of the sign of the dimension-8 condensate, and a comparison with other determinations.




Figure 1. The (Oq) condensate as a function of $s_0$. Error bands show the propagation of experimental uncertainties.



Following the steps of Section 2, we obtain a parametrization of $F_2(x, Q^2)$. Fig. 2 compares our parametrization with experimental data. Note that the uncertainties in $F_2(x, Q^2)$ are automatically incorporated in the parametrization. Note also a relevant feature: in regions without experimental data, the uncertainties increase in a very characteristic way, so the region where the parametrization ceases to be reliable is under control.

It is interesting to study the effect of incorporating new experiments by comparing our parametrization with the old version of F2neural [2], where the only experimental data came from the NMC and BCDMS experiments (see Fig. 3). The two fits are consistent in the region covered by the same experimental data (the high-$x$ region) but differ at low $x$, where the new fit is better, as expected since it incorporates the HERA data.
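One simple way to quantify whether two fits are consistent at a given $(x, Q^2)$ is to compare their ensemble means in units of the combined ensemble uncertainty. The sketch below is an illustrative estimator, not necessarily the one used in Ref. [9]; it treats the two fits as independent, which is only approximate when they share experimental data, and the replica values are hypothetical numbers.

```python
import math
import statistics


def compatibility(preds_a, preds_b):
    """|<f>_a - <f>_b| in units of the combined ensemble uncertainty.

    Treats the two fits as independent, which is only approximate when
    they share experimental data."""
    mean_a, mean_b = statistics.fmean(preds_a), statistics.fmean(preds_b)
    std_a, std_b = statistics.stdev(preds_a), statistics.stdev(preds_b)
    return abs(mean_a - mean_b) / math.hypot(std_a, std_b)


# Hypothetical replica predictions of F2 at one (x, Q^2) point from two fits.
old_fit = [0.30, 0.32, 0.34]
new_fit = [0.31, 0.32, 0.33]
d = compatibility(old_fit, new_fit)  # well below 1: compatible
```

Values of order one or smaller indicate agreement within uncertainties; scanning the estimator over $x$ would locate the low-$x$ region where the two fits separate.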

Figure 2. Comparison of the F2neural fit with experimental data. Only diagonal errors are shown; point-to-point correlations might be large.

Figure 3. Comparison of the old and new versions of the F2neural fit. Note the characteristic increase of uncertainty in regions without experimental data.

5. Conclusions and future work 

We have presented a general technique to parametrize experimental data and applied it to two different problems. We have constructed a probability measure in the space of spectral functions and in the space of structure functions, showing that additional theoretical constraints (sum rules, kinematical constraints) can be incorporated in the probability measure. These two applications increase our confidence in the validity of our approach. The next step is to construct a probability measure in the space of PDFs, with full control over experimental uncertainties, accurate error propagation and no bias due to the choice of fixed functional forms.